Over a century ago, Ivan P. Pavlov demonstrated in classic experiments how dogs learn to associate a ringing bell with food, so that the bell alone comes to elicit salivation. Today, Pavlovian-style associative learning is rarely found in artificial intelligence (AI) applications, even though other learning concepts, in particular backpropagation on artificial neural networks (ANNs), have flourished. However, training by backpropagation on "conventional" ANNs, especially in the form of modern deep neural networks (DNNs), is computationally and energy intensive. Here we experimentally demonstrate a form of backpropagation-free learning using a single (or monadic) associative hardware element. We realize this on an integrated photonic platform that combines phase-change materials with on-chip cascaded directional couplers. Using our monadic Pavlovian photonic hardware, we then develop a scaled-up circuit network that delivers a distinct machine-learning framework based on single-element associations and, importantly, addresses general learning tasks with a backpropagation-free architecture. Our approach reduces the computational burden imposed by learning in conventional neural-network approaches, thereby improving speed, while also offering the higher bandwidth inherent to our photonic implementation.
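As a rough software analogue of this kind of single-element associative learning, consider the sketch below, assuming a scalar weight `w` in place of the phase-change-material transmission state and a simple coincidence-based update rule; the class name, learning rate, and update rule are illustrative, not the authors' hardware implementation.

```python
# Toy software analogue of a single (monadic) associative element.
# In the hardware, the association is stored as a phase-change-material
# transmission state; here it is just a scalar `w`. All names and the
# update rule are illustrative assumptions, not the paper's design.

class AssociativeElement:
    def __init__(self, eta=0.2):
        self.w = 0.0      # associative weight ("bell" -> response)
        self.eta = eta    # learning rate

    def step(self, bell, food):
        # Response: unconditioned stimulus (food) plus the learned
        # association with the conditioned stimulus (bell).
        response = food + self.w * bell
        # Local, backpropagation-free update: strengthen the association
        # only when the two stimuli coincide (Hebbian-style coincidence).
        self.w += self.eta * bell * food * (1.0 - self.w)
        return response

elem = AssociativeElement()
for _ in range(20):                   # pairing phase: bell + food together
    elem.step(bell=1.0, food=1.0)
print(elem.step(bell=1.0, food=0.0))  # bell alone now evokes a response
```

The update is purely local to the element, which is the point: no error signal is propagated backward through a network.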
Attention mechanisms form a core component of several successful deep learning architectures, and are based on one key idea: "The output depends only on a small (but unknown) segment of the input." In several practical applications like image captioning and language translation, this is mostly true. In trained models with an attention mechanism, the outputs of an intermediate module that encodes the segment of input responsible for the output are often used as a way to peek into the "reasoning" of the network. We make such a notion more precise for a variant of the classification problem that we term selective dependence classification (SDC) when used with attention model architectures. Under such a setting, we demonstrate various error modes where an attention model can be accurate but fail to be interpretable, and show that such models do occur as a result of training. We illustrate various situations that can accentuate and mitigate this behaviour. Finally, we use our objective definition of interpretability for SDC tasks to evaluate a few attention model learning algorithms designed to encourage sparsity, and demonstrate that these algorithms help improve interpretability.
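To make the accuracy-versus-interpretability distinction concrete, here is a minimal sketch of such a check, assuming a generic dot-product attention classifier (an illustrative stand-in, not the architecture studied in the paper): the model counts as interpretable on an example when its attention peak lands on the segment the label actually depends on.

```python
import torch
import torch.nn as nn

def attention_classify(segments, query, classifier):
    # segments: (n_seg, d) encodings of the input segments; query: (d,)
    scores = segments @ query                # relevance score per segment
    alpha = torch.softmax(scores, dim=0)     # attention weights
    context = alpha @ segments               # attention-weighted summary
    return classifier(context), alpha

def attention_is_faithful(alpha, true_segment_idx):
    # In SDC the label depends on exactly one segment; the explanation is
    # faithful on this example if the attention peak matches that segment.
    return int(alpha.argmax()) == true_segment_idx

d, n_seg, n_classes = 8, 5, 3
clf = nn.Linear(d, n_classes)
segments, query = torch.randn(n_seg, d), torch.randn(d)
logits, alpha = attention_classify(segments, query, clf)
print(logits.argmax().item(), attention_is_faithful(alpha, true_segment_idx=2))
```

A model can return the correct class while `attention_is_faithful` is False, which is exactly the error mode the paper demonstrates.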
Continuous-time Markov chains are used to model stochastic systems where transitions can occur at irregular times, e.g., birth-death processes, chemical reaction networks, population dynamics, and gene regulatory networks. We develop a method to learn a continuous-time Markov chain's transition rate functions from fully observed time series. In contrast with existing methods, our method allows for transition rates to depend nonlinearly on both state variables and external covariates. The Gillespie algorithm is used to generate trajectories of stochastic systems where propensity functions (reaction rates) are known. Our method can be viewed as the inverse: given trajectories of a stochastic reaction network, we generate estimates of the propensity functions. While previous methods used linear or log-linear methods to link transition rates to covariates, we use neural networks, increasing the capacity and potential accuracy of learned models. In the chemical context, this enables the method to learn propensity functions from non-mass-action kinetics. We test our method with synthetic data generated from a variety of systems with known transition rates. We show that our method learns these transition rates with considerably more accuracy than log-linear methods, in terms of mean absolute error between ground truth and predicted transition rates. We also demonstrate an application of our methods to open-loop control of a continuous-time Markov chain.
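For context, a compact Gillespie simulator is sketched below; the `rate_fns` argument is exactly the object the method estimates (here, closed-form birth-death rates stand in for learned neural networks). Function names and the example rates are illustrative.

```python
import numpy as np

def gillespie(x0, stoich, rate_fns, t_max, rng):
    """Simulate a CTMC trajectory with the Gillespie algorithm.
    x0: initial state; stoich: (n_reactions, n_species) state changes;
    rate_fns: callables x -> nonnegative propensity (the quantities the
    method above learns, as neural networks instead of formulas)."""
    t, x = 0.0, np.array(x0, dtype=float)
    times, states = [t], [x.copy()]
    while t < t_max:
        rates = np.array([f(x) for f in rate_fns])
        total = rates.sum()
        if total <= 0:
            break
        t += rng.exponential(1.0 / total)            # time to next event
        j = rng.choice(len(rates), p=rates / total)  # which reaction fires
        x += stoich[j]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Example: birth-death process with rates b(x) = 2 and d(x) = 0.1 * x.
rng = np.random.default_rng(0)
times, states = gillespie([10], np.array([[1], [-1]]),
                          [lambda x: 2.0, lambda x: 0.1 * x[0]], 50.0, rng)
```

The inverse problem described above starts from `times` and `states` and recovers estimates of the two propensity functions.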
This paper focuses on a stochastic system identification problem: given time series observations of a stochastic differential equation (SDE) driven by Lévy $\alpha$-stable noise, estimate the SDE's drift field. For $\alpha$ in the interval $[1,2)$, the noise is heavy-tailed, leading to computational difficulties for methods that compute transition densities and/or likelihoods in physical space. We propose a Fourier space approach that centers on computing time-dependent characteristic functions, i.e., Fourier transforms of time-dependent densities. Parameterizing the unknown drift field using Fourier series, we formulate a loss consisting of the squared error between predicted and empirical characteristic functions. We minimize this loss with gradients computed via the adjoint method. For a variety of one- and two-dimensional problems, we demonstrate that this method is capable of learning drift fields in qualitative and/or quantitative agreement with ground truth fields.
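A minimal sketch of the loss in question follows, assuming samples of the process at a fixed time and a grid of frequencies; the drift parameterization and the forward solve that produces the predicted characteristic function are omitted.

```python
import numpy as np

def empirical_cf(samples, freqs):
    # phi_hat(s) = (1/N) * sum_n exp(i * s * X_n)
    return np.exp(1j * np.outer(freqs, samples)).mean(axis=1)

def cf_loss(predicted_cf, samples, freqs):
    # Squared error between predicted and empirical characteristic
    # functions, evaluated on a frequency grid.
    resid = predicted_cf - empirical_cf(samples, freqs)
    return np.sum(np.abs(resid) ** 2)

# Sanity check with a known alpha-stable case (standard Cauchy, alpha = 1):
freqs = np.linspace(-5, 5, 101)
samples = np.random.default_rng(0).standard_cauchy(10_000)
phi_cauchy = np.exp(-np.abs(freqs))          # exact CF of standard Cauchy
print(cf_loss(phi_cauchy, samples, freqs))   # small: model matches data
```

Working in Fourier space sidesteps the heavy tails: the empirical characteristic function is bounded by 1 even when $\alpha$-stable noise makes physical-space densities awkward to handle.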
The dichotomy between the challenging nature of obtaining annotations for activities, and the more straightforward nature of data collection from wearables, has resulted in significant interest in the development of techniques that utilize large quantities of unlabeled data for learning representations. Contrastive Predictive Coding (CPC) is one such method, learning effective representations by leveraging properties of time-series data to set up a contrastive future timestep prediction task. In this work, we propose enhancements to CPC, by systematically investigating the encoder architecture, the aggregator network, and the future timestep prediction, resulting in a fully convolutional architecture, thereby improving parallelizability. Across sensor positions and activities, our method shows substantial improvements on four of six target datasets, demonstrating its ability to empower a wide range of application scenarios. Further, in the presence of very limited labeled data, our technique significantly outperforms both supervised and self-supervised baselines, positively impacting situations where collecting only a few seconds of labeled data may be possible. This is promising, as CPC does not require specialized data transformations or reconstructions for learning effective representations.
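The contrastive future-step prediction at the heart of CPC can be sketched as follows, with toy convolutional stand-ins for the encoder and aggregator; the dimensions, the 4-step horizon, and the use of in-batch negatives are illustrative assumptions, and the paper's systematic architecture study is not reproduced here.

```python
import torch
import torch.nn.functional as F

enc = torch.nn.Conv1d(3, 32, kernel_size=5, padding=2)   # per-step features z_t
# A real CPC aggregator is causal/autoregressive; this symmetric conv,
# trimmed back to length T, is a simplification.
agg = torch.nn.Conv1d(32, 32, kernel_size=9, padding=8)  # context c_t
proj = torch.nn.Linear(32, 32)                           # predicts z_{t+k} from c_t

x = torch.randn(16, 3, 128)                   # batch of wearable-sensor windows
z = enc(x)                                    # (B, 32, T)
c = agg(z)[..., :z.shape[-1]]                 # trim back to length T
k = 4                                         # predict 4 steps ahead
pred = proj(c[..., :-k].transpose(1, 2))      # (B, T-k, 32)
target = z[..., k:].transpose(1, 2)           # (B, T-k, 32)

# InfoNCE: for each (sample, t), the true future step must score higher
# than the corresponding steps of the other samples in the batch.
logits = torch.einsum('btd,ntd->btn', pred, target)   # (B, T-k, B)
labels = torch.arange(16).view(16, 1).expand(-1, pred.shape[1])
loss = F.cross_entropy(logits.reshape(-1, 16), labels.reshape(-1))
print(loss.item())   # ~ln(16) before training
```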
To properly assist humans in their needs, human activity recognition (HAR) systems need the ability to fuse information from multiple modalities. Our hypothesis is that multimodal sensors, visual and non-visual, tend to provide complementary information, addressing the limitations of other modalities. In this work, we propose a multimodal framework that learns to effectively combine features from RGB video and IMU sensors, and show its robustness on the MMAct and UTD-MHAD datasets. Our model is trained in two stages: in the first stage, each input encoder learns to effectively extract features, and in the second stage, the model learns to combine these individual features. We show significant improvements of 22% and 11% over video-only and IMU-only setups on the UTD-MHAD dataset, and of 20% and 12% on the MMAct dataset. Through extensive experimentation, we show the robustness of our model in the zero-shot setting and the limited-annotated-data setting. We further compare with state-of-the-art methods that use more input modalities, and show that our method significantly outperforms them on the more difficult MMAct dataset and performs comparably on the UTD-MHAD dataset.
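A minimal sketch of the two-stage recipe follows, with placeholder encoders and a concatenation-based fusion head; the layer sizes, the freezing scheme, and the fusion choice are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

video_enc = nn.Sequential(nn.Flatten(), nn.Linear(512, 128), nn.ReLU())
imu_enc   = nn.Sequential(nn.Flatten(), nn.Linear(120, 128), nn.ReLU())
fusion    = nn.Linear(256, 27)   # e.g. 27 activity classes in UTD-MHAD
loss_fn   = nn.CrossEntropyLoss()

def stage1_step(enc, head, x, y, opt):
    # Stage 1: each encoder learns features with its own classifier head.
    opt.zero_grad()
    loss = loss_fn(head(enc(x)), y)
    loss.backward(); opt.step()
    return loss.item()

def stage2_step(xv, xi, y, opt):
    # Stage 2: encoders frozen; only the fusion head learns to combine.
    with torch.no_grad():
        fv, fi = video_enc(xv), imu_enc(xi)
    opt.zero_grad()
    loss = loss_fn(fusion(torch.cat([fv, fi], dim=1)), y)
    loss.backward(); opt.step()
    return loss.item()

xv, xi = torch.randn(4, 512), torch.randn(4, 120)
y = torch.randint(0, 27, (4,))
head_v = nn.Linear(128, 27)
opt1 = torch.optim.Adam(list(video_enc.parameters()) + list(head_v.parameters()))
print(stage1_step(video_enc, head_v, xv, y, opt1))
opt2 = torch.optim.Adam(fusion.parameters())
print(stage2_step(xv, xi, y, opt2))
```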
Complex multi-objective missions require the coordination of heterogeneous robots at multiple interconnected levels, such as coalition formation, scheduling, and motion planning. This challenge is exacerbated by dynamic changes, such as sensor and actuator failures, communication loss, and unexpected delays. We introduce Dynamic Iterated Task Allocation Graph Search (D-ITAGS) to simultaneously address coalition formation, scheduling, and motion planning in dynamic settings involving heterogeneous teams. D-ITAGS achieves resilience via two key characteristics: (i) interleaved execution and (ii) targeted repair. Interleaved execution enables an efficient search for solutions at each layer while avoiding incompatibilities with other layers. Targeted repair identifies and fixes the parts of an existing solution affected by a given disruption, while preserving the remainder. In addition to our algorithmic contributions, we provide theoretical insights into the inherent trade-off between time and resource optimality in these settings, and derive meaningful bounds on schedule suboptimality. Our experiments reveal that (i) D-ITAGS is significantly faster than recomputation from scratch in dynamic settings, with little to no loss in solution quality, and (ii) the theoretical suboptimality bounds consistently hold in practice.
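As a toy illustration of targeted repair on the allocation layer alone, the sketch below reassigns only the tasks of a failed robot and leaves every other assignment (and its downstream schedule) untouched; the greedy least-loaded heuristic and all names are illustrative simplifications of D-ITAGS's layered graph search.

```python
def repair_allocation(allocation, capabilities, failed_robot):
    """allocation: task -> robot; capabilities: robot -> set of doable tasks.
    Repair only the entries invalidated by the failure."""
    affected = [t for t, r in allocation.items() if r == failed_robot]
    survivors = {r: c for r, c in capabilities.items() if r != failed_robot}
    for task in affected:
        candidates = [r for r, caps in survivors.items() if task in caps]
        if not candidates:
            return None   # infeasible: a full repair would widen its scope
        # Greedy choice: least-loaded capable robot (illustrative heuristic).
        load = {r: sum(1 for v in allocation.values() if v == r)
                for r in candidates}
        allocation[task] = min(candidates, key=load.get)
    return allocation

alloc = {"scan": "r1", "lift": "r2", "weld": "r1"}
caps = {"r1": {"scan", "lift", "weld"},
        "r2": {"scan", "lift"},
        "r3": {"scan", "weld"}}
print(repair_allocation(alloc, caps, failed_robot="r1"))
# {'scan': 'r3', 'lift': 'r2', 'weld': 'r3'}: only r1's tasks moved
```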
Article prediction is a task that has long defied accurate linguistic description. As such, the task is well suited to evaluating the ability of models to emulate native-speaker intuition. To this end, we compare the performance of native English speakers and pre-trained models on the task of article prediction, set up as a three-way choice (a/an, the, zero). Our experiments with BERT show that BERT outperforms humans on all articles. In particular, BERT is far better than humans at detecting the zero article, possibly because zero articles are inserted according to rules that a deep neural model can easily pick up. More interestingly, we find that BERT tends to agree more with annotators than with the corpus when inter-annotator agreement is high, but agrees more with the corpus as inter-annotator agreement drops. We contend that this alignment with annotators, despite training on the corpus, suggests that BERT is not memorizing article use but rather capturing high-level generalizations about articles similar to human intuition.
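One way to set up the three-way choice is sketched below, marking the article slot with `[MASK]` and classifying among {a/an, the, zero}; whether the paper uses exactly this marking or classification head is an assumption, and the untrained head would need fine-tuning on article-annotated data before its predictions mean anything.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)   # 0: a/an, 1: the, 2: zero

# Mark the article slot; "zero" means no article belongs there at all.
sentence = "She adopted [MASK] cat from the shelter."
inputs = tok(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(["a/an", "the", "zero"][logits.argmax().item()])
# Untrained classification head: fine-tune on article-annotated text
# before comparing against human performance.
```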
Local explainability methods, those which seek to generate an explanation for each prediction, are increasingly prevalent due to practitioners' need to rationalize their model outputs. However, comparing local explainability methods is difficult, since each generates outputs in varying scales and dimensions. Furthermore, due to the stochastic nature of some explainability methods, different runs of a method may produce contradictory explanations for a given observation. In this paper, we propose a topology-based framework for extracting a simplified representation from a set of local explanations. We do so by first modeling the relationship between the explanation space and the model predictions as a scalar function. Then, we compute the topological skeleton of this function. This topological skeleton acts as a signature for such functions, which we use to compare different explanation methods. We demonstrate that our framework can not only reliably identify differences between explainability techniques but also provide stable representations. We then show how our framework can be used to identify appropriate parameters for local explainability methods. Our framework is simple, does not require complex optimization, and can be broadly applied to most local explanation methods. We believe that the practicality and versatility of our approach will help promote topology-based approaches as a tool for understanding and comparing explanation methods.
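In its simplest form, this kind of topological signature can be illustrated with 0-dimensional sublevel-set persistence of a 1-D scalar function (think of model predictions along a 1-D parameterization of explanation space); reducing the explanation space to one dimension is an illustrative shortcut, not the paper's construction.

```python
import numpy as np

def persistence_0d(f):
    """0-dim sublevel-set persistence of a 1-D function via union-find."""
    f = np.asarray(f, dtype=float)
    parent = {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    pairs = []
    for i in np.argsort(f, kind="stable"):  # sweep values low -> high
        parent[i] = i
        for j in (i - 1, i + 1):
            if j not in parent:
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # Elder rule: the component born at the higher value dies here.
            old, young = (ri, rj) if f[ri] <= f[rj] else (rj, ri)
            if f[young] < f[i]:             # skip zero-persistence pairs
                pairs.append((f[young], f[i]))   # (birth, death)
            parent[young] = old
    pairs.append((f.min(), np.inf))          # global minimum never dies
    return sorted(pairs)

# Each local minimum is a feature; its persistence measures prominence.
print(persistence_0d([3, 1, 4, 1.5, 5, 0.5, 4]))
# [(0.5, inf), (1.0, 5.0), (1.5, 4.0)]
```

Two explanation methods inducing different prediction landscapes over the same explanation space would yield different persistence signatures, which can then be compared directly.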
We develop methods to learn the correlation potential of a time-dependent Kohn-Sham (TDKS) system in one spatial dimension. We start from a low-dimensional two-electron system for which we can numerically solve the time-dependent Schrödinger equation; this yields electron densities suitable for training models of the correlation potential. We frame the learning problem as one of optimizing a least-squares objective subject to the constraint that the dynamics obey the TDKS equation. Applying adjoint methods, we develop efficient methods to compute the gradient and thereby learn models of the correlation potential. Our results show that it is possible to learn values of the correlation potential such that the resulting electron densities match ground-truth densities. We also show how to learn correlation potential functionals with memory, demonstrating one such model that yields reasonable results for trajectories outside the training set.
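The constrained-least-squares-plus-adjoint structure can be sketched on a scalar surrogate problem, fitting $\theta$ in $x' = -\theta x$ so the trajectory matches observations; the TDKS equation is replaced by this toy ODE, and only the adjoint gradient pattern carries over.

```python
import numpy as np

def loss_and_grad(theta, x0, x_obs, dt):
    n = len(x_obs)
    x = np.empty(n); x[0] = x0
    for k in range(n - 1):                    # forward solve (Euler)
        x[k + 1] = x[k] - dt * theta * x[k]
    loss = 0.5 * np.sum((x - x_obs) ** 2)
    lam = np.zeros(n)                         # adjoint variables
    lam[-1] = x[-1] - x_obs[-1]
    grad = 0.0
    for k in range(n - 2, -1, -1):            # backward adjoint sweep
        lam[k] = (x[k] - x_obs[k]) + lam[k + 1] * (1 - dt * theta)
        grad += -dt * x[k] * lam[k + 1]       # accumulate d(loss)/d(theta)
    return loss, grad

# Gradient descent recovers theta_true = 0.7 from a clean trajectory.
dt, n = 0.05, 100
x_true = (1 - dt * 0.7) ** np.arange(n)       # exact Euler trajectory
theta = 0.1
for _ in range(2000):
    L, g = loss_and_grad(theta, 1.0, x_true, dt)
    theta -= 1e-3 * g
print(theta)   # ~0.7
```

One backward sweep yields the full gradient regardless of how many parameters the potential model has, which is what makes the adjoint approach efficient for the neural and memory-dependent models described above.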